freelancer.com · 2026-05-08
Scrape book details from multiple sources for ten thousand ISBNs and push the data into a Google Sheets workbook.
Client: Pune, India (member since 2021-05-11)
Price: $9/hr (average bid)
Problem: Automate the extraction of book information from Amazon and external APIs, keeping it robust against throttling, bot detection, and page-format changes.
Existing: Not specified
Specifications:
[Target] Scrape book details for ten thousand ISBNs.
[Method] Combine Scrapy and Playwright with rotating residential proxies to handle Amazon's throttling and bot checks, and integrate external APIs as additional data sources.
[UI/UX] Not applicable
[Stack] Python, Scrapy, Playwright, Google Sheets API, external APIs (e.g., Open Library)
[Security] Implement rate limiting, use secure proxy management, and ensure data privacy during transmission.
[Format] Cleaned and normalized JSON before pushing to Google Sheets.
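One way to approach the [Target] and [Format] items is to key every record on a validated ISBN-13. The following is a minimal sketch that assumes input ISBNs may arrive as ISBN-10 or ISBN-13 with hyphens or spaces. The record fields (`isbn13`, `title`, `authors`, `source`) and the function names are hypothetical. The check-digit rules are the standard ISBN-10 (mod 11) and ISBN-13 (mod 10, weights 1 and 3) ones.

```python
import json
import re
from typing import Any, Optional


def _isbn13_check_digit(first12: str) -> str:
    # ISBN-13 check digit: weights alternate 1, 3, 1, 3, ... over the first 12 digits.
    total = sum(int(c) * (1 if i % 2 == 0 else 3) for i, c in enumerate(first12))
    return str((10 - total % 10) % 10)


def normalize_isbn(raw: str) -> Optional[str]:
    """Return a canonical ISBN-13, or None if `raw` is not a valid ISBN-10/13."""
    # Drop hyphens, spaces and other separators; keep digits and X (ISBN-10 check value 10).
    s = re.sub(r"[^0-9Xx]", "", raw).upper()
    if re.fullmatch(r"\d{9}[\dX]", s):
        # ISBN-10 check: sum of digit * weight (10 down to 1) must be divisible by 11.
        total = sum((10 - i) * (10 if c == "X" else int(c)) for i, c in enumerate(s))
        if total % 11:
            return None
        body = "978" + s[:9]  # ISBN-10 -> ISBN-13: add the 978 prefix, recompute the check digit
        return body + _isbn13_check_digit(body)
    if re.fullmatch(r"\d{13}", s):
        return s if _isbn13_check_digit(s[:12]) == s[12] else None
    return None


def normalize_record(raw: dict[str, Any]) -> dict[str, Any]:
    """Flatten one scraped item (hypothetical field names) into a clean record."""
    authors = (a.strip() for a in (raw.get("authors") or []) if a and a.strip())
    return {
        "isbn13": normalize_isbn(str(raw.get("isbn") or "")),
        "title": " ".join(str(raw.get("title") or "").split()) or None,  # collapse whitespace
        "authors": list(dict.fromkeys(authors)),  # de-duplicate, keeping original order
        "source": raw.get("source"),
    }


def to_json(records: list[dict[str, Any]]) -> str:
    # ensure_ascii=False keeps non-English titles readable in the output.
    return json.dumps(records, ensure_ascii=False, indent=2)
```

Converting every ISBN-10 to ISBN-13 gives one join key across Amazon, Open Library and the sheet. Invalid inputs come back as `None`, so they can be logged instead of being scraped.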
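The listing asks for rate limiting but does not name a technique. A token bucket is one common choice. The minimal sketch below assumes a single-threaded client. `TokenBucket`, its parameters and the injectable `clock` are hypothetical names chosen for illustration.

```python
import time
from typing import Callable


class TokenBucket:
    """Token-bucket limiter: `rate` tokens per second on average, bursts up to `capacity`."""

    def __init__(self, rate: float, capacity: int, clock: Callable[[], float] = time.monotonic):
        if rate <= 0 or capacity < 1:
            raise ValueError("rate must be > 0 and capacity >= 1")
        self.rate = rate
        self.capacity = capacity
        self.clock = clock  # injectable so the limiter can be tested without sleeping
        self.tokens = float(capacity)  # start full, allowing an initial burst
        self.last = clock()

    def _refill(self) -> None:
        now = self.clock()
        # Credit tokens for the elapsed time, capped at the bucket's capacity.
        self.tokens = min(float(self.capacity), self.tokens + (now - self.last) * self.rate)
        self.last = now

    def try_acquire(self) -> bool:
        """Take one token if available; False means the caller should wait and retry."""
        self._refill()
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False

    def wait_time(self) -> float:
        """Seconds until at least one token is available (0.0 if one is available now)."""
        self._refill()
        return max(0.0, (1.0 - self.tokens) / self.rate)

    def acquire(self) -> None:
        """Block until a token is taken. Not for use inside Scrapy's event loop."""
        while not self.try_acquire():
            time.sleep(self.wait_time())
```

The blocking `acquire` suits a plain API client, such as one calling Open Library. Inside Scrapy, `time.sleep` would block the Twisted reactor. For crawl requests, Scrapy's own `DOWNLOAD_DELAY` and AutoThrottle settings are the more natural place to throttle.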
Workflow:
1. Set up a Scrapy project with custom middlewares to handle Amazon's throttling and bot checks.
2. Integrate Playwright to scrape dynamically rendered content from Amazon and other external sources.
3. Implement proxy management using a rotating residential proxy service to avoid detection.
4. Develop a robust data cleaning and normalization module to handle various formats of scraped data.
5. Create a Google Sheets connector that inserts or updates rows atomically, preserving existing formulas.
6. Keep the codebase PEP 8-compliant and documented, and provide detailed setup instructions for macOS and Ubuntu.
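Step 5 can be split into a pure planning function and a single API call. The sketch below assumes a hypothetical sheet layout:

- Row 1 is a header.
- Column A holds the ISBN-13 key.
- Columns A to D hold scraped data.
- Any formulas sit in columns to the right of D, so writing only A:D leaves them intact.

`plan_upsert`, `DATA_COLUMNS` and this layout are assumptions. The output dicts follow the `ValueRange` shape (`range`, `values`) used by the Sheets API v4 `spreadsheets.values.batchUpdate` method.

```python
from typing import Any

# Hypothetical sheet layout: row 1 is a header, columns A-D hold scraped data,
# and any formula columns sit to the right of D, so they are never written.
DATA_COLUMNS = ["isbn13", "title", "authors", "source"]
LAST_COL = chr(ord("A") + len(DATA_COLUMNS) - 1)  # "D"; valid for up to 26 columns


def _cell(value: Any) -> Any:
    # The Sheets API skips null cells, so send "" to clear a field explicitly.
    if value is None:
        return ""
    if isinstance(value, list):
        return ", ".join(str(v) for v in value)
    return value


def plan_upsert(
    sheet_name: str,
    existing_keys: list[str],
    records: list[dict[str, Any]],
    header_rows: int = 1,
) -> list[dict[str, Any]]:
    """Return ValueRange-shaped dicts: update rows whose key exists, append the rest.

    `existing_keys` is the key column below the header, top to bottom, e.g. as read
    with a values.get call on column A (the API omits trailing empty rows).
    """
    quoted = "'" + sheet_name.replace("'", "''") + "'"  # A1 notation escapes ' as ''
    row_of = {key: header_rows + i + 1 for i, key in enumerate(existing_keys) if key}
    next_row = header_rows + len(existing_keys) + 1
    data = []
    for rec in records:
        key = rec["isbn13"]
        row = row_of.get(key)
        if row is None:
            # New ISBN: claim the next empty row; repeats later in the batch reuse it.
            row = row_of[key] = next_row
            next_row += 1
        values = [_cell(rec.get(col)) for col in DATA_COLUMNS]
        data.append({"range": f"{quoted}!A{row}:{LAST_COL}{row}", "values": [values]})
    return data
```

The returned list would go in the `data` field of a `batchUpdate` body, for example `{"valueInputOption": "RAW", "data": data}`. With `RAW`, ISBNs are stored as text instead of being parsed as numbers. Sending every update and append in one request keeps the writes grouped. The Sheets API documentation should be checked to confirm whether a failed request can be partially applied.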